Fully-online construction of suffix trees and DAWGs for multiple texts
Abstract
We consider fully-online construction of indexing data structures for multiple texts. Let T = {T1, ..., TK} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts, in which, once a new character has been appended to the kth text Tk, the previous texts T1, ..., Tk−1 remain static. The fully-online scenario arises when we index multi-sensor data. We propose fully-online algorithms that construct the directed acyclic word graph (DAWG) and the generalized suffix tree (GST) for T in O(N log σ) time and O(N) space, where N and σ denote the total length of the texts in T and the alphabet size, respectively.
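To make the fully-online setting concrete, here is a minimal Python sketch of a DAWG over several texts in which any text may receive the next character. It is not the authors' implementation: it adapts the classical online DAWG (suffix automaton) construction by keeping one "last state" pointer per text, which is an assumption of this sketch. The class name `FullyOnlineDawg` and its methods are hypothetical, and the dictionary-based transitions give expected constant-time lookups rather than the O(log σ) bound stated in the abstract.

```python
class FullyOnlineDawg:
    """DAWG over several texts, each of which may be extended at any time.

    Illustrative sketch only: classical online suffix-automaton updates
    with one 'last' pointer per text (an assumption, not the paper's code).
    """

    def __init__(self, num_texts):
        self.trans = [{}]             # trans[s][c] -> target state
        self.link = [-1]              # suffix link (-1 for the root)
        self.length = [0]             # length of the longest string in a state
        self.last = [0] * num_texts   # state representing each whole text

    def _new_state(self, length, trans, link):
        self.trans.append(trans)
        self.link.append(link)
        self.length.append(length)
        return len(self.length) - 1

    def _split(self, p, q, c):
        # Clone q so that the strings of length <= length[p] + 1 get their own state.
        clone = self._new_state(self.length[p] + 1, dict(self.trans[q]), self.link[q])
        self.link[q] = clone
        while p != -1 and self.trans[p].get(c) == q:
            self.trans[p][c] = clone
            p = self.link[p]
        return clone

    def _target(self, p, c):
        # State whose longest string is (longest string of p) + c;
        # requires that the transition trans[p][c] already exists.
        q = self.trans[p][c]
        if self.length[q] == self.length[p] + 1:
            return q
        return self._split(p, q, c)

    def append(self, k, c):
        """Append character c to the k-th text (0-based)."""
        p = self.last[k]
        if c in self.trans[p]:
            # The extended text already occurs as a substring of some text.
            self.last[k] = self._target(p, c)
            return
        cur = self._new_state(self.length[p] + 1, {}, 0)
        while p != -1 and c not in self.trans[p]:
            self.trans[p][c] = cur
            p = self.link[p]
        if p != -1:
            self.link[cur] = self._target(p, c)
        self.last[k] = cur

    def contains(self, pattern):
        """Return True if pattern is a substring of at least one text."""
        s = 0
        for c in pattern:
            s = self.trans[s].get(c)
            if s is None:
                return False
        return True


if __name__ == "__main__":
    dawg = FullyOnlineDawg(2)
    # Interleaved appends: text 0 becomes "ab", text 1 becomes "ba".
    for k, c in [(0, "a"), (1, "b"), (0, "b"), (1, "a")]:
        dawg.append(k, c)
    print(dawg.contains("ba"))    # True: "ba" is text 1
    print(dawg.contains("aba"))   # False: occurs in neither text
```

The point of the sketch is the interface: `append(k, c)` accepts any text index at any time, whereas a semi-online structure would only allow extending the most recently started text.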
Related resources
Fully-online Construction of Suffix Trees for Multiple Texts
We consider fully-online construction of indexing data structures for multiple texts. Let T = {T1, ..., TK} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts in which, after a new character is appended to the kth text ...
Bidirectional Construction of Suffix Trees
String matching is critical in information retrieval since in many cases information is stored and manipulated as strings. By constructing and utilizing a suitable data structure for a text string, we can solve the string matching problem efficiently. Such a structure is called an index structure. Suffix trees are certainly the most widely-known and extensively-studied structure of this kind. In t...
Sparse Directed Acyclic Word Graphs
The suffix tree of a string w is a text indexing structure that represents all suffixes of w. A sparse suffix tree of w represents only a subset of the suffixes of w. An application of sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version ...
On-line construction of suffix trees
An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadrat...
String Processing Algorithms
The thesis describes extensive studies on various algorithms for efficient string processing. Data available in or via computers are often of enormous size, and thus it is important and necessary to invent time- and space-efficient methods to process them. Most of such data are, in fact, stored and manipulated as strings. String matching is most fundamental in string processing, where...
Journal: CoRR
Volume: abs/1507.07622
Publication date: 2015